Scheduling Load Operations on VLIW Machines

Authors

Charles R. Hardnett

Krishna V. Palem

Rodric M. Rabbah

Weng-Fai Wong

Abstract

The gap between processor speeds and memory access times continues to widen. This gap is seen in systems ranging from embedded computing systems to high-performance supercomputing systems. In this paper, we present an instruction scheduling algorithm targeted at VLIW architectures, which are commonly found in embedded systems and high-performance workstations (e.g., Itanium). The goal of this paper is to present a simple instruction scheduling algorithm that does not require substantial hardware support and that schedules load operations so as to mask the latency of delinquent loads, that is, loads associated with high miss rates and very long average latencies. Our algorithm is named Cache Sensitive Scheduling (CSS). CSS is designed to be sensitive to the varying memory latencies of load operations and to compensate for those latencies within the instruction schedule: it masks the typically long latencies of load operations with useful operations in order to reduce stall penalties. CSS extends a rank-function-based scheduler with two additional components that incorporate the profiled average latency of an operation and the latencies of its predecessors. Our results show that these additional components are effective in generating schedules that are more sensitive to the latencies of load instructions. To support the selection and relative weighting of our rank function components, we use multivariate statistical analysis to determine the degree of correlation between the rank components and the execution time of the program. In our experiments with a parameterized VLIW compiler-simulator infrastructure and a variety of memory hierarchy configurations, we achieved 20% speedups and 44% stall cycle reductions over a more conventional critical path scheduling algorithm.
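The idea of extending a rank function with latency terms can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not give the rank function, so the linear form, the weights `w_cp`, `w_lat`, `w_pred`, the use of a sum over predecessors, and every name below (`Op`, `add_edge`, `rank`, `priority_order`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Op:
    name: str
    latency: float           # nominal latency assumed by a conventional scheduler
    profiled_latency: float  # average latency from a profile run (assumed input)
    preds: list = field(default_factory=list)
    succs: list = field(default_factory=list)


def add_edge(src, dst):
    """Record a data dependence src -> dst in the dependence DAG."""
    src.succs.append(dst)
    dst.preds.append(src)


def critical_path(op, memo):
    """Longest nominal-latency path from op to any sink of the DAG."""
    if op.name not in memo:
        tail = max((critical_path(s, memo) for s in op.succs), default=0.0)
        memo[op.name] = op.latency + tail
    return memo[op.name]


def rank(op, memo, w_cp=1.0, w_lat=1.0, w_pred=0.5):
    """Hypothetical rank: critical path plus two latency-aware terms.

    The weights and the linear combination are illustrative assumptions,
    not values or formulas taken from the paper.
    """
    pred_latency = sum(p.profiled_latency for p in op.preds)
    return (w_cp * critical_path(op, memo)
            + w_lat * op.profiled_latency
            + w_pred * pred_latency)


def priority_order(ops, **weights):
    """Order operations by descending rank, as a list scheduler would consume them."""
    memo = {}
    return sorted(ops, key=lambda o: rank(o, memo, **weights), reverse=True)


# Two independent load/use chains; only one load misses often in the profile.
ld_hit = Op("ld_hit", latency=2, profiled_latency=2)
ld_miss = Op("ld_miss", latency=2, profiled_latency=40)
use_hit = Op("use_hit", latency=1, profiled_latency=1)
use_miss = Op("use_miss", latency=1, profiled_latency=1)
add_edge(ld_hit, use_hit)
add_edge(ld_miss, use_miss)
ops = [ld_hit, use_hit, ld_miss, use_miss]

baseline = [o.name for o in priority_order(ops, w_lat=0.0, w_pred=0.0)]
css_like = [o.name for o in priority_order(ops)]
print(baseline)
print(css_like)
```

With the latency terms switched off, the two loads have the same rank and the order falls back to input order. With them switched on, the load with the high profiled latency is ranked first, which gives a list scheduler the chance to place independent work behind it.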

Related resources

Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors

The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache-miss of any of the operations...

Cache Sensitive Instruction Scheduling

The processor speeds continue to improve at a faster rate than the memory access times. The issue of data locality is still unsolved, and continues to be a problem given the widening gap between processor speeds and memory access times. Compiler research has chosen to address this problem in many directions including source code transformations of loops, static data reorganization, dynamic data...

Optimality of the flexible job shop scheduling system based on Gravitational Search Algorithm

The Flexible Job Shop Scheduling Problem (FJSP) is one of the most general and difficult of all traditional scheduling problems. The Flexible Job Shop Problem (FJSP) is an extension of the classical job shop scheduling problem which allows an operation to be processed by any machine from a given set. The problem is to assign each operation to a machine and to order the operations on the machine...

Exploring Energy-Performance Trade-Offs for Heterogeneous Interconnect Clustered VLIW Processors

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving clock speed, reducing energy consumption of the logic, and making design simpler, it introduces extra overheads by way of inter-cluster communication. This communication ...

Publication date: 2004
